Optimization Methods

There are several methods available to perform optimizations. The following provides a brief overview of the approach the methods use and, where appropriate, references to published material related to the methods. Options specific to the different methods are also described here.

Set up an Outcome Optimization by first Defining Outcome Payoffs and then setting up parameters to be search over using the Optimization Specs. It is in this panel that you select a method to use

Powell

Powell is an efficient conjugate gradient search as described in: Powell, M. J. D. (June 2009). The BOBYQA algorithm for bound constrained optimization without derivatives (Report). Department of Applied Mathematics and Theoretical Physics, Cambridge University (www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf retrieved September, 2017). It is a robust approach to finding the maximum value of the payoff in a small number of simulations for payoff surfaces that are reasonably well behaved. It is a local search technique, and may miss global optima for payoff surfaces that have multiple maxima or large areas of unchanging payoff values.

The additional global options for Powell are:

Tolerance is amount of a parameter change that is considered significant. This defaults to 10E-5 which is reasonable for parameters that are in the 1 to 100 range. Larger tolerance values will tend to speed convergence at the cost of precision.

Initial Step is the approximate expected size of change that should be made in the parameters. It is 1 by default. Use larger values if the parameters are large number, smaller if they are fractional numbers. This influences the efficiency of the early search process.

Max Sims specifies the maximum number of simulations that will be executed before the method will stop if it has not yet converged. This is a backstop to prevent excessive computation. Use -1 to not constrain the number of simulations. If you have specified additional starts each start will use up to this number of simulations.

Stop on overflow, if checked, will stop an optimization when a divide by 0 or overflow occurs and then ask you if you want to run with the parameters generating the overflow. This is a good way to debug optimization problems, but will not record the best parameters found so far.

The additional Per Parameter option for Powell is:

Scaling is the amount by which the parameter should be multiplied before applying the global tolerance and initial step criteria. Use this when some parameters are much larger in scale than others. For example, if all parameters were fractions except one that was expected to be in the millions, you would apply a scaling of 1e-6 to the parameter in the millions. If all parameters are of the same magnitude, the default value of 1.0 will be fine.

Grid

Grid provides a mechanism for methodically searching parameters across their ranges. It works by changing each parameter across a sequence of values. The total number of searches performed in a grid search is the product of the number of values searched for each parameter. If there are many parameters, this can be a very big number, so there are options to make the grid search more contained, or more progressive.

Grid ignores initial parameter values (and therefore does not ever perform additional starts) except when Sequential is selected as the search type.

The additional global option for Grid is:

Search Type: lets you choose between different ways to execute the grid search.

Exhaustive goes through all possible combinations. The number of searches required is the product of the number of steps for each of the parameters which can be a very big number.

Bisecting starts from the min and max for each parameter and then does the middle point of that, then the middle points of the middle points and so on. When bisection would give a smaller interval for a parameter than the parameter's tolerance, the bisecting for that parameter stops. Bisecting has the advantage of testing extremes early, and then refining the evaluations. It can be used with large tolerances on some parameters to quicken the search, or simply canceled to review results.

Sequential goes through the parameters in sequence, instead of all at once. The first parameter is searched with the others left at their model values. The best value from the first is then used and the second parameter is searched and so on. The total number of simulations is thus the sum, rather than the product, of the number for each parameter. Sequential works well when there is strong independence among the parameter effects (for example, as occurs if they are additive to the payoff).

Latin Hypercube goes through all values by only testing each one a single time along each dimension. It can be useful when there are one or two parameters that contribute most strongly to the payoff, but these parameters are not known in advance.

The additional per parameter options for Grid are:

Number of steps is a positive integer between 2 and 101.This is the number of steps between min and max that will be taken for the parameter. The default is 11.

Tolerance is a positive number (bigger than 1e-12). It is used only if Bisecting is selected and determines when the bisecting process will stop for this parameter.

Differential Evolution

Differential evolution (DE) is a population-based evolutionary method for real-valued global optimization that creates new solutions using rapidly computed differences in existing solutions (Price, K.V., Storn, R.M., and Lampinen, J.A., 2005. Differential evolution: A practical approach to global optimization. Springer-Verlag, Berlin, Germany and Storn, R., Price, K., 1997. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4) 341-359). This enables DE to follow the contours of the fitness landscape at less computational cost than other methods.

DE uses a real-valued representation and operates by combining existing solutions with weighted difference vectors formed from other solutions. By using weighted differences of existing solutions, it automatically adapts its step size and its orientation as convergence occurs, shifting from a global search (exploration) to a local search (exploitation) method (Price et al., 2005). It tends to converge to the global optimum faster (in fewer fitness function evaluations) than other real-valued approaches.

There are a number of DE variants, named according to the template DE/mutation/diffs/crossover where mutation is the method used to mutate the parent vector, diffs is the number of pairs of difference vectors used in mutation (i.e., how many differences, usually only 1 or 2), and crossover, if present, is the method used to cross the new vector with the parent vector.

Multiobjective optimization uses fast non-dominated sorting to select values on the non-dominated front. It includes several other improvements designed to speed convergence to the global optimum and give a uniformly-spaced non-dominated front. For more information, see: Chichakly, K. J., & Eppstein, M. J. (2013, July). Improving uniformity of solution spacing in bi-objective evolution. In Proceedings of the 15th annual conference companion on Genetic and evolutionary computation (pp. 87-88). ACM.

Report Interval

The report interval is the frequency with which aggregate performance information for a generation of parameter values is displayed. If this is 0 then approximately 50 value will be reported for large numbers of generations.

Mutation Type

DE has specific names for the members of the parent population used in mutation. The parent, i.e., the member of the population that will be replaced if the child is better, is known as the target vector. The member of the population that is being mutated to form the child is known as the base vector. Finally, the members used to create the difference (perturbation) for the mutation are known as the difference vectors. This implementation supports the five standard types of DE mutation:

rand: Favors exploration with the base and difference vectors randomly chosen. When combined with binomial crossover, this is known as classic DE.

best: Favors exploitation with the base vector always being the best solution from the parent population, modulated by randomly chosen difference vectors. Compared to rand, it “usually speeds convergence, reduces the odds of stagnation, and lowers the probability of success,” (Price et al., 2005, p. 73) where success refers to converging to the global optimum rather than a local one.

target-to-rand: Compromise between exploration and exploitation leaning toward exploration with the base vector always being the target vector, modulated by both randomly chosen difference vectors and the difference between a randomly chosen vector and the target vector.

target-to-bestt: Compromise between exploration and exploitation leaning toward exploitation with the base vector always being the target vector, modulated by both randomly chosen difference vectors and the difference between the best solution from the parent population and the target vector.

either-or: Alternates between DE/rand/1 and a recombinant version of DE/rand/2 based on a probability. The main motivation behind this formula is to fill discovered gaps in the mutation/recombination space (Price et al., 2005). It is considered more effective than rand when the payoff functions cannot be decomposed.

Note It is usually best to start with rand as the mutation type.

Crossover Type

Use the Crossover Type to specify how the characteristics of the next generation population are derived from the current generation.

bin: A separate binomial trial is performed for each gene, with the child gene replacing the target vector gene only if the uniformly drawn random number is below the crossover probability. This is also known as uniform crossover and is the method used in Classic DE.

exp: Using an exponential distribution and starting at a random gene, genes are copied from the child to the target as long as uniformly drawn random numbers remain below the crossover probability.

Note At least one gene is always copied with the above methods to ensure the new potential solution is not the same as the base vector.

Note Use bin unless you have a specific need for exp.

Minimization, maximization, constraints and payoff definitions

The differential evolution method allows constrained multicriteria optimization. When you activate a payoff you can set the optimizer to minimize it, maximize it, or treat it as a constraint.

Note Using negative payoff weights and maximize will give the same results as using positive payoff weights and minimize.

Generations specifies the number of generations to run the evolutionary algorithm. Keep in mind that (generations + 1)*population_size is the total number of times that your model will be run and plan accordingly. There is always a trade-off between generations, population size, and convergence.

Note Increasing generations by 10 and comparing results is a good way to test if there is more to be gained by further exploration.

Population size specifies the number of individuals to include in each generation. For single-objective problems, this can often be small, say 5 or 10. As you increase the number of objectives, you need to quickly increase this. For example, while 5 may be enough for one objective, you may need a minimum of 50 for two objectives and 500 for three objectives. This affects both convergence speed and the number of values on the final non-dominated front.

Tolerance is optional and specifies a convergence criteria that enables the optimization to terminate before the specified number of generations if the relative change in the mean between successive generations for all objectives is below this value.

Scaling factor specifies the mutation scaling factor applied to difference vectors. It is typically in the range of 0 to 1, though some problems are solved more readily with values up to 2. Smaller values favor exploration and larger values favor exploitation. The default value is typical.

Crossover prob is the crossover probability used with binomial or exponential crossover. The best results occur in the range of 0 to 0.2 or 0.9 to 1 (Price et al., 2005). Lower values favor exploitation while larger values favor exploration. The default value is a good place to start.

K/rand prob is optional and specifies either the recombination scaling factor for mutation types target-to-rand and target-to-best or the probability of selecting rand (mutation) over the recombinant version rand/2 in mutation type either-or. It is not used by any other mutation type. If not specified, the default recombination scaling factor is the same as the mutation scaling factor and the probability of choosing rand is set to 0.5. These are both good starting places, so, in general, leave this blank.

Seed is the random number seed used for the optimization. Set this if you want to be able to replicate your results at a later date. Leave it set to 0 to randomly choose a different seed each time the optimization is run.

Discrete

Discrete is a simple optimization technique that makes it easy to look over sequence of values that are integers or other quantized units. It operates on each optimization parameter in isolation, iterating through them until no improvement in the payoff can be found. It is well suited to optimizations in which the parameters are only lightly interrelated. When there is heavier interrelation using Differential Evolution, or multiple randomized starts, will likely give better results.

Max sims is the maximum number of simulations that will be made. If this number is reached the optimization will stop.

Max pass is the maximum number of passes through all the parameters (they are processed sequentially) that will be made. If Refine is set to true than this will determine the final solution increment for each parameter.

Tolerance is the amount of change in payoff (as a fraction of the payoff) considered significant.

Refine determines whether or not the increment for each parameter is decreased on each pass. When set to 2, each increment will be divided by two for each pass.

For each parameter specify the minimum value, maximum value and increment. If refine is set, the increment will be divided by 2 on each pass. There is also an option to explore beyond a single increment when the payoff is not changing more than the tolerance. These options are

Check only a single increment - when the payoff does not improve after a single step the other direction is checked and nothing done if neither yields an improvement.
Explore upward to the max - when the payoff does not change after increasing the parameter the next higher value is checked until the payoff changes or the maximum parameter value is reached. I f successful, downward search is skipped.
Explore downward to the min - when the payoff does not change after decreasing the parameter the next lower value is checked until the payoff changes or the minimum parameter value is reached. If not successful the parameter is left at its original value.
Explore both directions - uses both of the above options, first increasing then, if necessary, decreasing.
Exhaustive search on each pass - tests the parameter from the minimum to the maximum value. This is only practical if the parameter can only take on a limited number of values.